Add Irodori-TTS: Japanese TTS model port to MLX #591
Merged
lucasnewman merged 6 commits into Blaizzy:main on Mar 24, 2026
Blaizzy (Owner) requested changes on Mar 21, 2026 and left a comment:
Hey @yoshphys
Thanks for the contribution!
I just have a few nits:
- Please share an audio sample of the port and of the source model so we can compare them.
- Not needed: models/irodori_tts/convert.py
- Please move tests/test_irodori_tts.py to test_models.py and follow the format there.
- Upload converted models to mlx-community on Hugging Face (4bit, 5bit, 6bit, 8bit, bf16).
yoshphys (Contributor, Author) commented:
Update
Tests moved to

| Audio | File |
|---|---|
| Original (Aratako/Irodori-TTS-500M, PyTorch) | comparison_original.wav |
| MLX port (fp16, sequence_length=400, cfg_guidance_mode=alternating) | comparison_mlx_fp16.wav |
Port Aratako/Irodori-TTS-500M to mlx-audio. The model uses a DiT (Diffusion Transformer) with Rectified Flow sampling and a DACVAE codec (48 kHz, 128-dim latents).

Key components:
- IrodoriDiT: JointAttention (self+text+speaker), LowRankAdaLN, SwiGLU
- Euler sampler with CFG (independent/alternating/joint modes)
- Japanese text normalization + HuggingFace tokenizer (llm-jp/llm-jp-3-150m)
- DACVAE codec loaded from facebook/dacvae-watermarked via convert.py
- convert.py: converts PyTorch weights to MLX fp16 safetensors
- 28 unit tests covering architecture, text processing, sanitize, and smoke tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
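The commit message above names an Euler sampler with classifier-free guidance (CFG) over a Rectified Flow model. As a rough illustration of that general technique only (not the code in this PR), the sketch below integrates a velocity field with fixed Euler steps and mixes conditional and unconditional velocities with the standard CFG formula. The function names, the toy velocity field, the time direction (0 to 1), and the guidance scale are all assumptions made for this example; only the 128-dim latent size comes from the commit message.

```python
import numpy as np


def cfg_velocity(v_cond, v_uncond, scale):
    """Standard CFG mix: push the unconditional velocity toward the conditional one."""
    return v_uncond + scale * (v_cond - v_uncond)


def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)
    return x


# Toy setup (hypothetical): 4 latent frames of 128 dims, target is all ones.
rng = np.random.default_rng(0)
noise = rng.standard_normal((4, 128))
target = np.ones((4, 128))


def toy_velocity(x, t):
    # A rectified-flow velocity along the straight line noise -> target is constant.
    v_cond = target - noise
    # Stand-in for the unconditional branch; a real model would run a second pass.
    v_uncond = np.zeros_like(x)
    return cfg_velocity(v_cond, v_uncond, scale=1.0)


sample = euler_sample(toy_velocity, noise, num_steps=8)
```

With a guidance scale of 1.0 the mix reduces to the conditional velocity, so the toy trajectory lands on `target`. A real sampler would call the DiT for each branch and may use a different time convention or guidance schedule.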
- Remove models/irodori_tts/convert.py from tracking (reviewer: not needed)
- Delete tests/test_irodori_tts.py; move all 26 Irodori-TTS tests into tests/test_models.py following the established format (imports inside test methods, module-level stubs/helpers)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
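The test layout this commit describes (imports inside test methods, module-level stubs/helpers) might look roughly like the sketch below. It is not taken from `tests/test_models.py`: the class name, the helper, and its config keys are hypothetical, and the standard-library `json` module stands in for a model import.

```python
import unittest


def _make_stub_config():
    """Module-level helper (hypothetical): a tiny config shared by tests."""
    return {"latent_dim": 128, "num_layers": 2}


class TestIrodoriLayoutSketch(unittest.TestCase):
    def test_stub_config_roundtrip(self):
        # Import inside the test method so that collecting this module does
        # not pull in heavy dependencies. `json` stands in for a model import.
        import json

        config = _make_stub_config()
        restored = json.loads(json.dumps(config))
        self.assertEqual(restored["latent_dim"], 128)


# Run the sketch directly instead of relying on a test runner's discovery.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestIrodoriLayoutSketch)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Deferring imports to the test body means an import failure in one model surfaces as a failure of that test rather than breaking collection of the whole shared test file.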
7d8698f to 46767d6
lucasnewman reviewed on Mar 23, 2026
lucasnewman (Collaborator) approved these changes on Mar 23, 2026 and left a comment:
See minor comment but looks good to me!
convert.py was removed in a previous commit; the section is no longer needed. Also update the DACVAE download note to reflect that weights are fetched automatically on first use.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
- irodori_tts)
- `"irodori_tts"` entry to `MODEL_REMAPPING` in `tts/utils.py`

New files
- `models/irodori_tts/model.py`
- `models/irodori_tts/irodori_tts.py`
- `models/irodori_tts/config.py`
- `models/irodori_tts/sampling.py`
- `models/irodori_tts/text.py`
- `models/irodori_tts/convert.py`
- `models/irodori_tts/README.md`
- `tests/test_irodori_tts.py`

Test plan
- `python -m unittest mlx_audio.tts.tests.test_irodori_tts -v`
- `python -m mlx_audio.tts.models.irodori_tts.convert` (requires `torch`)
- `python -m mlx_audio.tts.generate --model ./Irodori-TTS-500M-fp16 --text "こんにちは"`
- `sequence_length=300` and `cfg_guidance_mode=alternating` to stay within memory limits

🤖 Generated with Claude Code